智能论文笔记

Action Planning for Packing Long Linear Elastic Objects into Compact Boxes with Bimanual Robotic Manipulation

Wanyu Ma , Bin Zhang , Lijun Han , Shengzeng Huo , Hesheng Wang , David Navarro-Alarcon

分类：机器人

2021-10-22

在本文中，我们提出了一种新的动作计划方法，将长线性弹性对象自动包装到具有双层机器人系统的常用盒中。为此，我们开发了一个混合几何模型，以处理结合基于在线视觉的方法和离线参考模板的大规模遮挡。然后，引入一个参考点发生器以自动计划预先设计的动作原始基底的参考姿势。最后，一个行动计划者集成了这些组件，以实现高级行为的执行以及包装操纵任务的完成。为了验证提出的方法，我们进行了一项详细的实验研究，其中有多种类型和长度的物体和包装盒。

translated by 谷歌翻译

Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks

Zae Myung Kim , Wanyu Du , Vipul Raheja , Dhruv Kumar , Dongyeop Kang

分类：自然语言处理

2022-12-02

Iterative text revision improves text quality by fixing grammatical errors, rephrasing for better readability or contextual appropriateness, or reorganizing sentence structures throughout a document. Most recent research has focused on understanding and classifying different types of edits in the iterative revision process from human-written text instead of building accurate and robust systems for iterative text revision. In this work, we aim to build an end-to-end text revision system that can iteratively generate helpful edits by explicitly detecting editable spans (where-to-edit) with their corresponding edit intents and then instructing a revision model to revise the detected edit spans. Leveraging datasets from other related text editing NLP tasks, combined with the specification of editable spans, leads our system to more accurately model the process of iterative text refinement, as evidenced by empirical results and human evaluations. Our system significantly outperforms previous baselines on our text revision tasks and other standard text revision tasks, including grammatical error correction, text simplification, sentence fusion, and style transfer. Through extensive qualitative and quantitative analysis, we make vital connections between edit intentions and writing quality, and better computational modeling of iterative text revisions.

translated by 谷歌翻译

A Learnable Variational Model for Joint Multimodal MRI Reconstruction and Synthesis

Wanyu Bian , Qingchao Zhang , Xiaojing Ye , Yunmei Chen

分类：计算机视觉 | 机器学习

2022-04-08

产生相同解剖结构的多对比度/模态MRI丰富了诊断信息，但由于数据获取时间过多而在实践中受到限制。在本文中，我们提出了一种新的深入学习模型，用于使用几种源模态的不完整的k空间数据作为输入，用于联合重建和合成多模式MRI。我们模型的输出包括源模式的重建图像和目标模式中合成的高质量图像。我们提出的模型被公式化为一个变异问题，该问题利用了几个可学习的特定特征提取器和多模式合成模块。我们提出了一种可学习的优化算法来求解该模型，该算法可以使用多模式MRI数据训练其参数的多相网络。此外，采用了一个二线优化框架进行鲁棒参数训练。我们使用广泛的数值实验证明了方法的有效性。

translated by 谷歌翻译

Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision

Wanyu Du , Zae Myung Kim , Vipul Raheja , Dhruv Kumar , Dongyeop Kang

分类：自然语言处理

2022-04-07

修订是人类写作过程的重要组成部分。它往往是战略性的，适应性的，更重要的是迭代性质。尽管大型语言模型在文本修订任务上取得了成功，但它们仅限于非著作，单次修订。研究和评估大语言模型进行连续修订和与人类作家合作的能力是建立有效写作助手的关键一步。在这项工作中，我们提出了一个人类的迭代文本修订系统，阅读，修订，重复（R3），旨在通过阅读模型生成的修订和用户反馈，以最少的人为努力来实现高质量的文本修订，修改文件，重复人机相互作用。在R3中，文本修订模型为人类作家提供了文本编辑建议，他们可以接受或拒绝建议的编辑。然后将所接受的编辑纳入模型，以进行下次文档修订版。因此，作家可以通过与系统进行交互并仅接受/拒绝其建议的编辑来修改文档，直到文本修订模型停止进行进一步修订或达到预定义的最大修订数量。经验实验表明，R3可以在早期的修订深度与人类作家进行可比的接受率进行修订，并且人机相互作用可以通过更少的迭代和编辑来获得更高质量的修订。收集的人类模型交互数据集和系统代码可在\ url {https://github.com/vipulrraheja/iterater}中获得。我们的系统演示可在\ url {https://youtu.be/lk08tipeoae}上获得。

translated by 谷歌翻译

A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge

Haodi Ma , Daisy Zhe Wang

分类：自然语言处理 | 人工智能 | 机器学习

2023-01-03

Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.

translated by 谷歌翻译

I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation

Haoyu Ma , Xiangru Lin , Yizhou Yu

分类：计算机视觉

2023-01-03

Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.

translated by 谷歌翻译

KoopmanLab: A PyTorch module of Koopman neural operator family for solving partial differential equations

Wei Xiong , Muyuan Ma , Pei Sun , Yang Tian

分类：机器学习

2023-01-03

Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.

translated by 谷歌翻译

StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma , Suzhen Wang , Zhipeng Hu , Changjie Fan , Tangjie Lv , Yu Ding , Zhidong Deng , Xin Yu

分类：计算机视觉

2023-01-03

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.

translated by 谷歌翻译

A New Perspective to Boost Vision Transformer for Medical Image Classification

Yuexiang Li , Yawen Huang , Nanjun He , Kai Ma , Yefeng Zheng

分类：计算机视觉 | 人工智能

2023-01-03

Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.

translated by 谷歌翻译

P3DC-Shot: Prior-Driven Discrete Data Calibration for Nearest-Neighbor Few-Shot Classification

Shuangmei Wang , Rui Ma , Tieru Wu , Yang Cao

分类：计算机视觉

2023-01-02

Nearest-Neighbor (NN) classification has been proven as a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false prediction if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation which is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support data based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show our efficient non-learning based method can outperform or at least comparable to SOTA methods which need additional learning steps.

translated by 谷歌翻译